NSF PAR Search | NSF Public Access Repository

Methodology for Adaptive Active Message Coalescing in Task Based Runtime Systems

https://doi.org/10.1109/IPDPSW.2018.00173

Wagle, Bibek; Kellar, Samuel; Serio, Adrian; Kaiser, Hartmut (May 2018, 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW))

Overheads associated with fine-grained communication in task-based runtime systems are one of the major bottlenecks that limit the performance of distributed applications. In this research, we provide a methodology and metrics for analyzing network overheads using the introspection capabilities of HPX, a task-based runtime system. We demonstrate that our metrics show a strong correlation with the overall runtime of our test applications. Our aim is to eventually use these metrics to tune, at runtime, parameters relating to active message coalescing. This method improves on the postmortem analysis techniques that are currently employed to tune network settings in distributed applications.
Full Text Available
Runtime Adaptive Task Inlining on Asynchronous Multitasking Runtime Systems

https://doi.org/10.1145/3337821.3337915

Wagle, Bibek; Monil, Mohammad Alaul; Huck, Kevin; Malony, Allen D.; Serio, Adrian; Kaiser, Hartmut (January 2019, ICPP 2019 Proceedings of the 48th International Conference on Parallel Processing)

As the era of high-frequency, single-core processors has come to a close, the new paradigm of many-core processors has come to dominate. In response to these systems, asynchronous multitasking runtime systems have been developed as a promising solution for efficiently utilizing this newly available hardware. Asynchronous multitasking runtime systems work by dividing a problem into a large number of fine-grained tasks. However, as the number of tasks created increases, the overheads associated with task creation and management cannot be ignored. Task inlining, a method where the parent thread consumes a child thread, enables the runtime system to balance parallelism against its overhead. Because it depends heavily on the processor architecture, the task inlining decision is dynamic in nature. In this research, we present adaptive techniques for deciding, at runtime, whether a particular task should be inlined or not. We present two policies: a baseline policy that makes inlining decisions based on a fixed threshold, and an adaptive policy that decides the threshold dynamically at runtime. We also evaluate and justify the performance of these policies on different processor architectures. To the best of our knowledge, this is the first study of the impact of an adaptive runtime policy for task inlining in an asynchronous multitasking runtime system on different processor architectures. From experimentation, we find that the baseline policy improves execution time by between 7.61% and 54.09%. Furthermore, the adaptive policy improves over the baseline policy by up to 74%.
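The two policies named in this abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation and does not use HPX: the class names, the cost estimate in microseconds, the `factor` multiplier, and the exponential smoothing rule are all hypothetical choices made for illustration, since the abstract does not say how the adaptive threshold is computed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FixedThresholdPolicy:
    """Baseline (hypothetical form): inline tasks cheaper than a fixed threshold."""
    threshold_us: float

    def should_inline(self, estimated_cost_us: float) -> bool:
        return estimated_cost_us < self.threshold_us


@dataclass
class AdaptiveThresholdPolicy:
    """Adaptive (hypothetical form): move the threshold toward a multiple of
    the task-management overhead measured at runtime."""
    threshold_us: float
    smoothing: float = 0.5   # assumed weight given to each new measurement
    factor: float = 2.0      # assumed ratio of threshold to measured overhead

    def observe_overhead(self, measured_overhead_us: float) -> None:
        # Exponential smoothing toward factor * measured overhead.
        target = self.factor * measured_overhead_us
        self.threshold_us = (
            (1.0 - self.smoothing) * self.threshold_us + self.smoothing * target
        )

    def should_inline(self, estimated_cost_us: float) -> bool:
        return estimated_cost_us < self.threshold_us


def run(task: Callable[[], object], estimated_cost_us: float, policy, spawn):
    """Run the child task in the parent if the policy says so, else spawn it."""
    if policy.should_inline(estimated_cost_us):
        return task()        # inlined: the parent executes the child directly
    return spawn(task)       # not inlined: hand the task to the scheduler
```

Both policies expose the same `should_inline` check, so a caller such as `run` can swap one for the other; only the adaptive policy changes its threshold as overhead measurements arrive.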
Full Text Available
Asynchronous Execution of Python Code on Task-Based Runtime Systems

https://doi.org/10.1109/ESPM2.2018.00009

Tohid, R.; Wagle, Bibek; Shirzad, Shahrzad; Diehl, Patrick; Serio, Adrian; Kheirkhahan, Alireza; Amini, Parsa; Williams, Katy; Isaacs, Kate; Huck, Kevin; et al (November 2018, 2018 IEEE/ACM 4th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2))

Despite advancements in the areas of parallel and distributed computing, the complexity of programming on High Performance Computing (HPC) resources has deterred many domain experts, especially in the areas of machine learning and artificial intelligence (AI), from utilizing the performance benefits of such systems. Researchers and scientists favor high-productivity languages to avoid the inconvenience of programming in low-level languages and the cost of acquiring the skills required for programming at this level. In recent years, Python, with the support of linear algebra libraries like NumPy, has gained popularity despite limitations that prevent such code from running in distributed settings. Here we present a solution that maintains both high-level programming abstractions and parallel and distributed efficiency. Phylanx is an asynchronous array processing toolkit that transforms Python and NumPy operations into code that can be executed in parallel on HPC resources by mapping Python and NumPy functions and variables into a dependency tree executed by HPX, a general-purpose, parallel, task-based runtime system written in C++. Phylanx additionally provides introspection and visualization capabilities for debugging and performance analysis. We have tested the foundations of our approach by comparing our implementation of widely used machine learning algorithms to accepted NumPy standards.
Full Text Available
A Massively Parallel Distributed N-body Application Implemented with HPX

https://doi.org/10.1109/ScalA.2016.012

Khatami, Zahra; Kaiser, Hartmut; Grubel, Patricia; Serio, Adrian; Ramanujam, J. (November 2016, 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA16))

One of the major challenges in parallelization is the difficulty of improving application scalability with conventional techniques. HPX provides efficient scalable parallelism by significantly reducing node starvation and effective latencies while controlling the overheads. In this paper, we present a new highly scalable parallel distributed N-Body application using a future-based algorithm, which is implemented with HPX. The main difference between this algorithm and prior art is that a future-based request buffer is used between different nodes and along each spatial direction to send/receive data to/from the remote nodes, which helps remove synchronization barriers. HPX provides an asynchronous programming model, which improves parallel performance. The results of using HPX to parallelize Octree construction on one node and the force computation on the distributed nodes show an average scalability improvement of about 45% compared to an equivalent OpenMP implementation and 28% compared to a hybrid implementation (MPI+OpenMP) [1], respectively, for one billion particles running on up to 128 nodes with 20 cores each.
Full Text Available
Harnessing billions of tasks for a scalable portable hydrodynamic simulation of the merger of two stars

https://doi.org/10.1177/1094342018819744

Heller, Thomas; Lelbach, Bryce_Adelstein; Huck, Kevin_A; Biddiscombe, John; Grubel, Patricia; Koniges, Alice_E; Kretz, Matthias; Marcello, Dominic; Pfander, David; Serio, Adrian; et al (February 2019, The International Journal of High Performance Computing Applications)

We present a highly scalable demonstration of a portable asynchronous many-task programming model and runtime system applied to a grid-based adaptive mesh refinement hydrodynamic simulation of a double white dwarf merger with 14 levels of refinement that spans 17 orders of magnitude in astrophysical densities. The code uses the portable C++ parallel programming model that is embodied in the HPX library and is being incorporated into the ISO C++ standard. The model represents a significant shift from existing bulk synchronous parallel programming models under consideration for exascale systems. Through the use of the Futurization technique, seemingly sequential code is transformed into wait-free asynchronous tasks. We demonstrate the potential of our model by showing results from strong scaling runs on the National Energy Research Scientific Computing Center’s Cori system (658,784 Intel Knights Landing cores) that achieve a parallel efficiency of 96.8% using billions of asynchronous tasks.